Refactor tiny-model generation scripts#5637
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 730c87629a
    inputs = processor.apply_chat_template(
        conversation=messages,
        add_generation_prompt=True,
Use non-chat fallback for VLM smoke inputs
smoke_test unconditionally calls processor.apply_chat_template(...) for every processor with an image_processor, but some VLM checkpoints in this refactor are not chat models (for example google/paligemma-3b-pt-224, whose script calls smoke_test(model, processor) in paligemma_for_conditional_generation.py). In that case apply_chat_template can fail due to missing/unsupported chat templates, so the script aborts before push even though the model itself is valid with regular processor(text=..., images=...) inputs.
I think we could address this in the future; google/paligemma-3b-pt-224 is the only VLM that doesn't have a chat template.
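For reference, a fallback along the lines suggested in the review could look roughly like this. This is a hypothetical sketch, not code from the PR: the name `build_vlm_inputs` and the stub behavior are illustrative, and the real processors are `transformers` `ProcessorMixin` objects rather than the stand-ins used here.

```python
def build_vlm_inputs(processor, messages, text, images):
    """Prefer the chat template when the processor defines one; otherwise
    fall back to a plain text+image processor call (covers checkpoints
    like google/paligemma-3b-pt-224 that ship no chat template)."""
    if getattr(processor, "chat_template", None) is not None:
        return processor.apply_chat_template(
            conversation=messages,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt",
            padding=True,
        )
    # Non-chat fallback: regular processor(text=..., images=...) inputs.
    return processor(text=text, images=images, return_tensors="pt", padding=True)
```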
        return_dict=True,
        return_tensors="pt",
        padding=True,
    ).to(device)
smoke_test crashes for PaliGemma lacking chat template
Medium Severity
The smoke_test VLM branch unconditionally calls processor.apply_chat_template(...) for any ProcessorMixin. The PaliGemma script passes its processor to smoke_test, but google/paligemma-3b-pt-224 has no chat template defined, so apply_chat_template will raise at runtime, making the PaliGemma generation script unusable. The PR discussion confirms PaliGemma is the only VLM without a chat template.
Reviewed by Cursor Bugbot for commit 5cc7fc8.
    config = AutoConfig.from_pretrained(MODEL_ID, text_config=text_config, vision_config=vision_config)
    model = PaliGemmaForConditionalGeneration(config).to(dtype=torch.float32)
    smoke_test(model, processor)
PaliGemma smoke_test crashes due to missing chat template
Medium Severity
smoke_test(model, processor) passes a ProcessorMixin to smoke_test, which takes the VLM branch and calls processor.apply_chat_template(...). PaliGemma (google/paligemma-3b-pt-224) has no chat template, so this call will crash at runtime. The PR discussion confirms this: "google/paligemma-3b-pt-224 is the only VLM that doesn't have a chat template." The smoke_test function needs a fallback path for VLM processors that lack a chat template.
Reviewed by Cursor Bugbot for commit 4730fec.
albertvillanova left a comment
Thanks.
You replaced 450 code lines with 2,765. Are you sure this is the right direction?
On the other hand, what if we want to regenerate all models? Or some specific family of models? After this PR there is no simple way other than running each model individually.
The pin of transformers for the lowest supported version is repeated in every script. Why not default to the lowest supported version if no explicit version is set for a specific model?
Also, I think create_pr should be the default.
Honestly, in this case, I don't think it's a big deal. Generally speaking, these scripts are written and run just once, and then we keep them mostly for reference. It's not something that requires a lot of maintenance.
This is how I see things:
the old approach was designed to generate all models. But in practice we never generate all models.
right, I'll do this
In practice, I often needed to run the script a few times to iterate on and align the configs before the diff is in a good state. Only once everything looks right do I actually want to open the PR. Maybe there is a confusion: when
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 385ac24.
Split tiny-model generation into per-model scripts
Replace the monolithic `scripts/generate_tiny_models.py` (one 437-line file: a big tuple-driven loop plus a growing `if issubclass(...)` ladder for VLMs) with a per-model layout: one script per tiny model. Each script is fully self-contained. Shared logic (push, smoke test, diff, weight init) stays in `_common.py`.

Why
`out_hidden_size`, Qwen3-VL `layer_types` deletion, Qwen3.5 linear-attn fp32 cast, Gemma4 in-place mutation, llava-v1.6 dtype hotfix, …). With one file per model, each script reads top-to-bottom in 20–50 lines. Model-specific quirks stay scoped to the model that needs them.
New features added to every script
- `print_config_diff(MODEL_ID, model)` prints every flat-key difference between the reference Hub config and the tiny model's config before push. Makes it obvious when a shrink kwarg was silently ignored or when an unexpected field drifted.
- `check_dtype_pattern(MODEL_ID, model)` reads the reference safetensors header via the Hub API (no weight download) and flags any tensor whose dtype diverges from the reference — catches cases like models with mixed-precision weights (e.g. fp32 norms inside a bf16 checkpoint).
- Every script declares `TRANSFORMERS_VERSION = "X.Y.Z"` and calls `check_transformers_version(...)`, which raises unless the installed version matches exactly. The pinned value is `max(introduction_version, trl_floor=4.56.2)`. Rationale: transformers is backward-compatible (a checkpoint saved by X loads on any ≥ X) but not forward-compatible; TRL CI runs against the floor, so tiny models must be saved with the oldest version that supports them to avoid config-field drift. Exact match prevents accidental regenerations with a newer transformers from silently breaking min-version CI.
- `--create-pr` flag. When a tiny model already exists on the Hub, the default is to skip the push. Passing `--create-pr` opens a single PR against the existing repo instead (all artifacts bundled into one commit via `HfApi.create_commit`), so updates can be reviewed before landing on `main`.

How to run
See `scripts/generate_tiny_models/README.md` for full documentation.

Scope
This PR is refactor-only — the Python logic for each tiny model is preserved exactly. No tiny model on the Hub is regenerated by this PR; the existing Hub repos remain the source of truth for CI.
Follow-up PRs will use the new scripts to regenerate and push individual tiny models where the existing Hub checkpoint drifts from the reference (e.g. non-size config fields defaulting to wrong values, missing upstream-added fields, quantization parity, etc.). Each regeneration is one PR per tiny model, with a `refs/pr/N` override in `tests/conftest.py` until merged on the Hub.

Note
Low Risk
Changes are confined to developer-facing Hub upload scripts and do not affect TRL runtime or test execution unless these scripts are manually run; main risk is accidental behavior drift when regenerating/pushing tiny models.
Overview
Refactors tiny-model generation by deleting the monolithic `scripts/generate_tiny_models.py` and replacing it with a package of per-model scripts under `scripts/generate_tiny_models/` (grouped by `for_causal_lm`, `for_sequence_classification`, and `for_conditional_generation`).

Adds shared helpers in `_common.py` to enforce an exact `transformers` version pin per script, run a minimal forward-pass smoke test, compare dtype patterns against the reference safetensors metadata, print config diffs vs the reference Hub config, and push all artifacts in a single Hub commit with optional `--create-pr` behavior (skip by default if the repo exists). Documentation for running and version pinning is added in `scripts/generate_tiny_models/README.md`.

Reviewed by Cursor Bugbot for commit 36baabe.
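The no-download dtype comparison mentioned above relies on the public safetensors layout: an 8-byte little-endian header length followed by a JSON table of tensor metadata. A minimal stdlib-only sketch of that idea follows; the names `dtypes_from_safetensors_header` and `dtype_mismatches` are illustrative, and the real `check_dtype_pattern` helper fetches the header bytes via the Hub API rather than from a local buffer.

```python
import json
import struct

def dtypes_from_safetensors_header(raw: bytes) -> dict[str, str]:
    """Map tensor name -> dtype from the leading bytes of a .safetensors file.

    The format starts with an 8-byte little-endian u64 giving the JSON
    header length, then the header itself; weights follow and are never read.
    """
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len])
    return {
        name: meta["dtype"]
        for name, meta in header.items()
        if name != "__metadata__"  # skip the optional metadata entry
    }

def dtype_mismatches(reference: dict[str, str], tiny: dict[str, str]) -> list[str]:
    """Tensor names whose dtype diverges from the reference checkpoint,
    e.g. fp32 norms inside an otherwise bf16 model."""
    return sorted(
        name for name, dtype in tiny.items()
        if name in reference and reference[name] != dtype
    )
```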